Translation and Dictionary
Thompson sampling : English Wikipedia
Thompson sampling

In artificial intelligence, Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
== Description ==

Consider a set of contexts \mathcal{X}, a set of actions \mathcal{A}, and rewards in \mathbb{R}. In each round, the player obtains a context x \in \mathcal{X}, plays an action a \in \mathcal{A} and receives a reward r \in \mathbb{R} following a distribution that depends on the context and the issued action. The aim of the player is to play actions so as to maximize the cumulative reward.
The elements of Thompson sampling are as follows:
# a likelihood function P(r|\theta,a,x);
# a set \Theta of parameters \theta of the distribution of r;
# a prior distribution P(\theta) on these parameters;
# past observation triplets \mathcal{D} = \{(x; a; r)\};
# a posterior distribution P(\theta|\mathcal{D}) \propto P(\mathcal{D}|\theta)P(\theta), where P(\mathcal{D}|\theta) is the likelihood function.
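As an illustration that is not part of the source article, assume a context-free bandit with binary rewards r \in \{0, 1\}, one parameter \theta_a \in [0, 1] per action, and independent Beta priors. Under these assumptions the elements listed above take a closed form:
:P(r|\theta,a) = \theta_a^{r} (1-\theta_a)^{1-r},
:P(\theta) = \prod_{a} \mathrm{Beta}(\theta_a; \alpha_a, \beta_a),
:P(\theta|\mathcal{D}) = \prod_{a} \mathrm{Beta}(\theta_a; \alpha_a + s_a, \beta_a + f_a),
where s_a and f_a are the numbers of rounds in \mathcal{D} in which action a returned reward 1 and reward 0, respectively. The hyperparameters \alpha_a, \beta_a and the counts s_a, f_a are notation introduced for this example only.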
Thompson sampling consists of playing the action a^\ast \in \mathcal{A} according to the probability that it maximizes the expected reward, i.e. action a^\ast is chosen with probability
:\int \mathbb{I}[\mathbb{E}(r|a^\ast,x,\theta) = \max_{a'} \mathbb{E}(r|a',x,\theta)] P(\theta|\mathcal{D}) \, d\theta,
where \mathbb{I} is the indicator function.
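The integral above rarely has a closed form, but it can be approximated by Monte Carlo. The following is a minimal sketch, not taken from the source article, that assumes a context-free bandit with binary rewards and a uniform Beta(1, 1) prior on each action's success probability, so that the expected reward of an action equals its sampled parameter. The function name `optimal_action_probabilities` and its arguments are hypothetical.

```python
import random


def optimal_action_probabilities(successes, failures, draws=10000, seed=0):
    """Estimate, for each action, the posterior probability that it is optimal.

    Assumes binary rewards and a Beta(1, 1) prior per action (an assumption
    of this sketch), so the posterior for action a is
    Beta(1 + successes[a], 1 + failures[a]).
    """
    rng = random.Random(seed)
    k = len(successes)
    wins = [0] * k
    for _ in range(draws):
        # One joint draw of theta from the posterior.
        theta = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        # Indicator term: which action maximizes the expected reward
        # under this particular theta.
        best = max(range(k), key=theta.__getitem__)
        wins[best] += 1
    # Averaging the indicator over draws approximates the integral.
    return [w / draws for w in wins]


if __name__ == "__main__":
    # Action 0 has succeeded 8 times out of 10, action 1 only 2 times out of 10.
    print(optimal_action_probabilities([8, 2], [2, 8]))
```

Each entry of the returned list approximates the probability with which Thompson sampling would play the corresponding action given the observations so far.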
In practice, the rule is implemented by sampling, in each round, a parameter \theta^\ast from the posterior P(\theta|\mathcal{D}), and choosing the action a^\ast that maximizes \mathbb{E}[r|\theta^\ast,a^\ast,x], i.e. the expected reward given the sampled parameter, the action and the current context. Conceptually, this means that the player instantiates their beliefs randomly in each round and then acts optimally according to them.
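The sample-then-maximize procedure can be sketched for the simplest case. The code below is an illustrative sketch rather than the article's own implementation: it assumes a context-free bandit with binary rewards and a uniform Beta(1, 1) prior on each action's success probability. The names `thompson_step`, `run_bandit` and `true_means` are hypothetical, and `true_means` stands in for an environment that the player cannot observe directly.

```python
import random


def thompson_step(successes, failures, rng):
    """Draw theta* from the posterior and return the action that is optimal under it."""
    # Posterior of each action under a Beta(1, 1) prior (assumption of this sketch).
    theta_star = [rng.betavariate(1 + s, 1 + f)
                  for s, f in zip(successes, failures)]
    # For binary rewards, the expected reward of an action is its theta.
    return max(range(len(theta_star)), key=theta_star.__getitem__)


def run_bandit(true_means, rounds, seed=0):
    """Simulate Thompson sampling against hidden Bernoulli reward probabilities."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(rounds):
        action = thompson_step(successes, failures, rng)
        # The environment returns a binary reward for the chosen action.
        reward = 1 if rng.random() < true_means[action] else 0
        # Recording the observation updates the posterior for the next round.
        if reward:
            successes[action] += 1
        else:
            failures[action] += 1
        total_reward += reward
    return successes, failures, total_reward


if __name__ == "__main__":
    s, f, total = run_bandit([0.2, 0.5, 0.8], rounds=2000)
    pulls = [a + b for a, b in zip(s, f)]
    print("pulls per action:", pulls, "total reward:", total)
```

Early rounds spread plays across all actions because the posteriors are wide; as observations accumulate, the posterior of the best action concentrates and it is played in most rounds.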

Excerpt source: Wikipedia, the free encyclopedia
Copyright(C) kotoba.ne.jp 1997-2016. All Rights Reserved.